Abstract


A natural measure of smoothness of a Boolean function is its sensitivity (the largest number of Hamming
neighbors of a point which differ from it in function value). The structure of smooth or equivalently
low-sensitivity functions is still a mystery. A well-known conjecture states that every such Boolean function
can be computed by a shallow decision tree. While this conjecture implies that smooth functions are
easy to compute in the simplest computational model, to date no non-trivial upper bounds were known
for such functions in any computational model, including unrestricted Boolean circuits. Even a bound
on the description length of such functions better than the trivial $2^n$ does not seem to have been known.

In this work, we establish the first computational upper bounds on smooth Boolean functions: 

• We show that every sensitivity $s$ function is uniquely specified by its values on a Hamming ball of
radius $2s$. We use this to show that such functions can be computed by circuits of size $n^{O(s)}$.

• We show that sensitivity $s$ functions satisfy a strong pointwise noise-stability guarantee for random
noise of rate $O(1/s)$. We use this to show that these functions have formulas of depth $O(s \log n)$.

• We show that sensitivity $s$ functions can be (locally) self-corrected from worst-case noise of rate
$\exp(-O(s))$.


All our results are simple, and follow rather directly from (variants of) the basic fact that the
function values at a few points in small neighborhoods of a given point determine its function value via
a majority vote. Our results confirm various consequences of the conjecture. They may be viewed as 
providing a new form of evidence towards its validity, as well as new directions towards attacking it. 


1 Introduction 


1.1 Background and motivation 

The smoothness of a continuous function captures how gradually it changes locally (according to the metric 
of the underlying space). For Boolean functions on the Hamming cube, a natural analog is sensitivity, 
capturing how many neighbors of a point have different function values. More formally, the sensitivity of 
a Boolean function $f : \{0,1\}^n \to \{0,1\}$ at input $x \in \{0,1\}^n$, written $s(f,x)$, is the number of neighbors
$y$ of $x$ in the Hamming cube such that $f(y) \ne f(x)$. The max sensitivity of $f$, written $s(f)$ and often
referred to simply as the “sensitivity of $f$”, is defined as $s(f) = \max_{x \in \{0,1\}^n} s(f,x)$. So, $0 \le s(f) \le n$,
and while not crucial, it may be good for the reader to consider this parameter as “low” when e.g. either
$s(f) \le (\log n)^{O(1)}$ or $s(f) \le n^{o(1)}$ (note that both upper bounds are closed under taking polynomials).
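The definition can be made concrete with a brute-force computation. The following is a minimal sketch, not part of the paper: the names `sensitivity_at` and `max_sensitivity` are illustrative, points are represented as tuples of 0/1, and the exhaustive loop over $\{0,1\}^n$ is only practical for small $n$.

```python
from itertools import product

def sensitivity_at(f, x):
    """s(f, x): number of Hamming neighbors y of x with f(y) != f(x)."""
    fx = f(x)
    count = 0
    for i in range(len(x)):
        y = x[:i] + (1 - x[i],) + x[i + 1:]  # flip coordinate i
        if f(y) != fx:
            count += 1
    return count

def max_sensitivity(f, n):
    """s(f): maximum of s(f, x) over all x in {0,1}^n (exhaustive search)."""
    return max(sensitivity_at(f, x) for x in product((0, 1), repeat=n))

# Illustrative examples: parity is maximally sensitive, a dictator is not.
parity = lambda x: sum(x) % 2
dictator = lambda x: x[0]
print(max_sensitivity(parity, 4), max_sensitivity(dictator, 4))  # 4 1
```

Parity on $n$ bits attains the upper bound $s(f) = n$, while a single-variable (dictator) function has $s(f) = 1$.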

To see why low-sensitivity functions might be considered smooth, let $\delta(\cdot,\cdot)$ denote the normalized Hamming
metric on $\{0,1\}^n$. A simple application of the triangle inequality gives

$$\mathbb{E}_{y : \delta(x,y) = \delta_0} |f(x) - f(y)| \le \delta_0\, s(f).$$

Thus $s(f)$ might be viewed as being somewhat analogous to the Lipschitz constant of $f$.

A well known conjecture states that every smooth Boolean function is computed by a shallow decision 
tree, specifically of depth polynomial in the sensitivity. This conjecture was first posed in the form of a 
question by Nisan [Nis91] and Nisan and Szegedy [NS94] but is now (we feel) widely believed to be true: 

Conjecture 1. [Nis91, NS94] There exists a constant $c$ such that every Boolean function $f$ has a decision
tree of depth $s(f)^c$.

The converse is trivial, since every Boolean function computable by a depth $d$ decision tree has sensitivity
at most $d$. However, the best known upper bound on decision tree depth in terms of sensitivity is
exponential (see Section 1.3). 

A remarkable series of developments, starting with Nisan’s paper [Nis91], showed that decision tree 
depth is an extremely robust complexity parameter, in being polynomially related to many other, quite diverse
complexity measures for Boolean functions, including PRAM complexity, block sensitivity, certificate
complexity, randomized decision tree depth, quantum decision tree depth, real polynomial degree, and
approximating polynomial degree. Arguably the one natural complexity measure that has defied inclusion in
this equivalence class is sensitivity. Thus, there are many equivalent formulations of Conjecture 1; indeed,
Nisan originally posed the question in terms of sensitivity versus block sensitivity [Nis91]. See the extensive 
survey [HKP11] for much more information about the conjecture and [BdW02] for background on various 
Boolean function complexity measures. 

Conjecture 1 is typically viewed as a combinatorial statement about the Boolean hypercube. However, 
the conjecture also makes a strong assertion about computation, stating that smooth functions have very 
low complexity; indeed, the conjecture posits that they are easy to compute in arguably the simplest computational
model: deterministic decision trees. This implies that smooth functions are easy for many other
“low-level” computational models via the following chain of inclusions:

$$\text{DecTree-depth}(\mathrm{poly}(s)) \subseteq \text{DNF-width}(\mathrm{poly}(s)) \subseteq \mathsf{AC}^0\text{-size}(n^{\mathrm{poly}(s)})$$

$$\subseteq \text{Formula-depth}(\mathrm{poly}(s)\log(n)) \subseteq \text{Circuit-size}(n^{\mathrm{poly}(s)}).$$

Given these inclusions, and the widespread interest that Conjecture 1 has attracted in the study of 
Boolean functions, it is perhaps surprising that no non-trivial upper bounds were previously known on 
low sensitivity functions in any computational model, including unrestricted Boolean circuits. Indeed, a
prerequisite for a family of functions to have small circuits is an upper bound on the number of functions
in the family, or equivalently on the description length of such functions; even such bounds were not previously
known for low sensitivity functions. This gap in our understanding of low sensitivity functions helped
motivate the present work. 

An equivalent formulation of Conjecture 1 is that every sensitivity s function is computed by a real 
polynomial whose degree is upper bounded by some polynomial in s. This is equivalent to saying that the 
Fourier expansion of the function has degree poly(s): 

Conjecture 2. [Nis91, NS94] (Equivalent to Conjecture 1) There exists a constant $c$ such that every Boolean
function $f$ is computed by a real polynomial of degree $s(f)^c$.

Given the analogy between sensitivity and the Lipschitz constant, this form of the conjecture gives a
natural discrete analog of continuous approximations of smooth Lipschitz functions by low-degree polynomials,
first obtained for the univariate case by Weierstrass [Wei85], which has had a huge influence on the
development of modern analysis. This led to a large body of work in approximation theory, and we mention
here the sharp quantitative version of the theorem [Jac30] and its extension to the multivariate case [NS64].

This formulation of the conjecture is also interesting because of the rich structure of low-degree polynomials
that low sensitivity functions are believed to share. For instance, low-degree real polynomials on
the Boolean cube are easy to interpolate from relatively few values (say over a Hamming ball). The interpolation
procedure can be made tolerant to noise, and local (these follow from the fact that low-degree real
polynomials also have low degree over $\mathbb{F}_2$). Again, our understanding of the structure of low sensitivity
functions was insufficient to establish such properties for them prior to this work. 

Finally, to every Boolean function $f$ one can associate the bipartite graph $G_f$ which has left and right
vertex sets $f^{-1}(0)$ and $f^{-1}(1)$, and which has an edge $(x, y)$ if the Hamming distance $d(x,y)$ is 1 and
$f(x) \ne f(y)$. A function has max sensitivity $s$ if and only if the graph $G_f$ has maximum degree at most $s$.
From this perspective one can view Conjectures 1 and 2 as a step towards understanding the graph-theoretic 
structure of Boolean functions and relating it to their computational and analytic structure (as captured by 
the Fourier expansion). In this paper, we propose proving various implications of the conjecture both as a 
necessary first step towards the conjecture, and as a means to better understanding low sensitivity functions 
from a computational perspective. 

1.2 Our Results 

Let $\mathcal{F}(s,n)$ denote the set of Boolean functions $f$ on $n$ variables such that $s(f) \le s$. We sometimes refer to
this class simply as “sensitivity $s$ functions” ($n$ will be implicit).

The starting point for our results is an upper bound stating that low-sensitivity functions can be interpolated
from Hamming balls. This parallels the fact that a degree $d$ polynomial can be interpolated from its
values on a Hamming ball of radius $d$.

Theorem 3. Every sensitivity $s$ function on $n$ variables is uniquely specified by its values on any Hamming
ball of radius $2s$ in $\{0,1\}^n$.

The simple insight here is that knowing the values of $f$ at any set of $2s + 1$ neighbors of a point $x$
uniquely specifies the value of $f$ at $x$: it is the majority value over the $2s + 1$ neighbors (else the point $x$
would be too sensitive). This implies the following upper bound on the number of sensitivity $s$ functions:

$$|\mathcal{F}(s,n)| \le 2^{\binom{n}{\le 2s}}.$$

Our proof of Theorem 3 is algorithmic (but inefficient). We build on it to give efficient algorithms that
compute $f$ at any point $x \in \{0,1\}^n$, given the values of $f$ on a Hamming ball as advice.

Our first algorithm takes a bottom-up approach. We know the values of $f$ on a ball of small radius
around the origin, and wish to infer its value at some arbitrary point $x$. Imagine moving the center of the ball
from the origin to $x$ along a shortest path. The key observation is that after shifting the ball by Hamming
distance 1, we can recompute the values of $f$ on the shift using a simple Majority vote.

Our second algorithm uses a top-down approach, reducing computing $f$ at $x$ to computing $f$ at $O(s)$
neighboring points of Hamming weight one less than $x$. We repeat this till we reach points of weight $O(s)$
(whose values we know from the advice). By carefully choosing the set of $O(s)$ neighbors, we ensure that
no more than $n^{O(s)}$ values need to be computed in total:

Theorem 4. Every sensitivity $s$ function is computed by a Boolean circuit of size $O(sn^{2s+1})$ and depth
$O(n^s)$.

Simon has shown that every sensitivity $s$ function depends on at most $2^{O(s)}$ variables [Sim82]. Thus,
the circuit we construct has size at most $2^{O(s^2)}$.

A natural next step would be to parallelize this algorithm. Towards this goal, we show that low sensitivity
functions satisfy a very strong noise-stability guarantee: Start at any point $x \in \{0,1\}^n$ and take a random
walk of length $n/10s$ to reach a point $y$. Then $f(x) = f(y)$ with probability 0.9, where the probability is
only over the coin tosses of the walk and not over the starting point $x$. Intuitively, this says that the value of
$f$ at most points in a ball of radius $n/10s$ around $x$ equals the value at $x$ (note that in contrast, Theorem 3
only uses the fact that most points in a ball of radius 1 agree with the center). We use this structural property
to get a small depth formula that computes $f$:

Theorem 5. Every sensitivity $s$ function is computed by a Boolean formula of depth $O(s \log n)$ and size
$n^{O(s)}$.

(By [Sim82], these formulas have depth at most $O(s^2)$ and size at most $2^{O(s^2)}$ as before.) At a high
level, we again use the values on a Hamming ball as advice. Starting from some arbitrary input $x$, we
use a variant of the noise-stability guarantee (which holds for “downward” random walks that only flip
1-coordinates to 0) to reduce the computation of $f$ at $x$ to computing $f$ on $O(1)$ many points whose weight
is less than that of $x$ by a factor of roughly $(1 - 1/(10s))$ (a majority vote on these serves to amplify the
success probability). Repeating this for each of these new points, recursively, for $O(s \log(n))$ times, we
reduce computing $f$ at $x$ to computing $f$ at various points in a small Hamming ball around the origin, which
we know from the advice.

We also show that low-sensitivity functions admit local self-correction. The setup here is that we are
given oracle access to an unknown function $r : \{0,1\}^n \to \{0,1\}$ that is promised to be close to a low
sensitivity function. Formally, there exists a sensitivity $s$ function $f : \{0,1\}^n \to \{0,1\}$ such that

$$\delta(r, f) := \Pr_{x \in \{0,1\}^n}[r(x) \ne f(x)] \le 2^{-ds}$$

for some constant $d$. We are then given an arbitrary $x \in \{0,1\}^n$ as an input, and our goal is to return $f(x)$
correctly with high probability for every $x$, where the probability is over the coin tosses of the (randomized)
algorithm. We show that there is a self-corrector for $f$ with the following guarantee:

Theorem 6. There exists a constant $d$ such that the following holds. Let $r : \{0,1\}^n \to \{0,1\}$ be such that
$\delta(r, f) \le 2^{-ds}$ for some sensitivity $s$ function $f$. There is an algorithm which, when given an oracle for $r$
and $x \in \{0,1\}^n$ as input, queries the oracle for $r$ at $(n/s)^{O(s)}$ points, runs in $(n/s)^{O(s)}$ time, and returns
the correct value of $f(x)$ with probability 0.99.

Our self-corrector is similar in spirit to our formula construction: our estimate for $f(x)$ is obtained by
taking the majority over a random sample of points in a ball of radius $n/10s$. Rather than querying these
points directly (since they might all be incorrect for an adversarial choice of $x$ and $r$), we use recursion.
We show that $O(s \log(n))$ levels of recursion guarantee that we compute $f(x)$ with good probability. The
analysis uses Bonami's hypercontractive inequality [O'D14].

Our results imply that low-degree functions and low sensitivity functions can each be reconstructed from
their values on small Hamming balls using simple but dissimilar-looking “propagation rules”. We show how
degree and sensitivity can be characterized by the convergence of these respective propagation rules, and use
this to present a reformulation of Conjecture 1.

1.3 Related Work 

The study of sensitivity originated from work on PRAMs [CDR86, Sim82]. As mentioned earlier, the 
question of relating sensitivity to other complexity measures such as block sensitivity was posed in [NS94]. 
There has been a large body of work on Conjecture 1 and its equivalent formulations, and recent years have 
witnessed significant interest in this problem (see the survey [HKP11] and the papers cited below). To date, 
the biggest gap known between sensitivity and other measures such as block-sensitivity, degree and decision 
tree depth is at most quadratic [Rub95, AS11]. Upper bounds on other measures such as block sensitivity 
and certificate complexity in terms of sensitivity are given in [KK04, ABG+14, AP14, APV15] (see also
[AV15]). Very recently, a novel approach to this conjecture via a communication game was proposed in the 
work of Gilmer et al. [GKS15]. 

1.4 Preliminaries 

We define the 0-sensitivity, 1-sensitivity and the max sensitivity of an $n$-variable function $f$ as

$$s_0(f) = \max_{x \in f^{-1}(0)} s(f,x), \quad s_1(f) = \max_{x \in f^{-1}(1)} s(f,x), \quad s(f) = \max_{x \in \{0,1\}^n} s(f,x) = \max(s_0(f), s_1(f)).$$

We denote the real polynomial degree of a function by $\deg(f)$ and its $\mathbb{F}_2$ degree by $\deg_2(f)$. We write
$\mathrm{wt}(x)$ for $x \in \{0,1\}^n$ to denote the Hamming weight of $x$ (number of ones). We write $\delta(f, g)$ for
$f, g : \{0,1\}^n \to \{0,1\}$ to denote $\Pr_{x \in \{0,1\}^n}[f(x) \ne g(x)]$.

For $x \in \{0,1\}^n$, let $B(x, r) \subseteq \{0,1\}^n$ denote the Hamming ball consisting of all points at distance at
most $r$ from $x$. Let $S(x, r)$ denote the Hamming sphere consisting of all points at distance exactly $r$ from
$x$. Let $N(x)$ denote the set of Hamming neighbors of $x$ (so $N(x)$ is shorthand for $S(x, 1)$), and let $N_r(x)$
denote the set of neighbors of $x$ of Hamming weight $r$ (points with exactly $r$ ones).

The following upper bound on sensitivity in terms of degree is due to Nisan and Szegedy. 

Theorem 7. [NS94] For every function $f : \{0,1\}^n \to \{0,1\}$, we have $s(f) \le 4(\deg(f))^2$.

We record Simon’s upper bound on the number of relevant variables in a low-sensitivity function: 

Theorem 8. [Sim82] For every function $f : \{0,1\}^n \to \{0,1\}$, the number of relevant variables $n'$ is
bounded by $n' \le s(f)\, 4^{s(f)}$.


2 Structural properties of low sensitivity functions

2.1 Bounding the description length 

We show that functions with low sensitivity have concise descriptions, so consequently the number of such 
functions is small. Indeed, we show that knowing the values on a Hamming ball of radius 2s + 1 suffices. 

2.1.1 Reconstruction from Hamming balls and spheres. 

The following simple but key observation will be used repeatedly: 

Lemma 9. Let $S \subseteq N(x)$ where $|S| \ge 2s + 1$. Then $f(x) = \mathrm{Maj}_{y \in S}(f(y))$.

Proof: Let $b \in \{0,1\}$ denote the majority value of $f$ over $S$ and let $S^b \subseteq S$ be the subset of $S$ over which
$f$ takes the value $b$. Note that $|S^b| \ge \lceil |S|/2 \rceil \ge s + 1$ since $|S| \ge 2s + 1$. If $f(x) \ne b$, then every vertex in
$S^b$ represents a sensitive neighbor of $x$, and thus $s(f, x) \ge s + 1$ which is a contradiction. ■

Theorem 10. Every sensitivity s function is uniquely specified by its values on a ball of radius 2s. 

Proof: Suppose that we know the values of $f$ on $B(x, 2s)$. We may assume by relabeling that $x = 0^n$ is the
origin. Note that $B(0^n, 2s)$ is just the set of points of Hamming weight at most $2s$.

We will prove that $f$ is uniquely specified on points $x$ where $\mathrm{wt}(x) \ge 2s$ by induction on $r = \mathrm{wt}(x)$.
The base case $r = 2s$ is trivial. For the induction step, assume we know $f$ for all points of weight up to $r$
for some $r \ge 2s$. Consider a point $x$ with $\mathrm{wt}(x) = r + 1$. The set $N_r(x)$ of weight-$r$ neighbors of $x$ has
size $r + 1 \ge 2s + 1$. Hence

$$f(x) = \mathrm{Maj}_{y \in N_r(x)} f(y) \qquad (1)$$

by Lemma 9. ■
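The inductive proof translates directly into an (exponential-time) reconstruction procedure. The following is a minimal sketch under stated assumptions: the function name `reconstruct_from_ball`, the dictionary representation of the advice, and the example function are illustrative choices, not taken from the paper.

```python
from itertools import combinations, product

def reconstruct_from_ball(advice, n, s):
    """Extend f from points of weight <= 2s to all of {0,1}^n via Equation (1).

    advice: dict mapping each tuple of weight <= 2s to its value of f.
    Assumes f has sensitivity at most s, so every majority below is strict.
    """
    table = dict(advice)
    for r in range(2 * s, n):  # fill in weight r + 1 from weight r
        for ones in combinations(range(n), r + 1):
            x = tuple(1 if i in ones else 0 for i in range(n))
            # N_r(x): the r + 1 >= 2s + 1 neighbors obtained by clearing one 1-bit
            votes = sum(table[x[:i] + (0,) + x[i + 1:]] for i in ones)
            table[x] = 1 if 2 * votes > r + 1 else 0
    return table

# Illustrative example: a one-address-bit addressing function (sensitivity 2).
f = lambda x: x[0] if x[1] else x[2]
n, s = 7, 2
advice = {x: f(x) for x in product((0, 1), repeat=n) if sum(x) <= 2 * s}
table = reconstruct_from_ball(advice, n, s)
print(all(table[x] == f(x) for x in product((0, 1), repeat=n)))  # True
```

The procedure touches every point of the cube, which is why the later sections look for ways to compute a single value $f(x)$ without rebuilding the whole truth table.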

Note that by Equation (1), we only need to know $f$ on the sphere of radius $r$ rather than the entire ball to
compute $f$ on inputs of weight $r + 1$. This observation leads to the following sharpening for $s \le n/4$.

Corollary 11. Let $s \le n/4$. Every sensitivity $s$ function is uniquely specified by its values on a sphere of
radius $2s$.

Proof: As before we may assume that $x = 0^n$. By Equation (1), the values of $f$ on $S(0^n, r)$ fix the values at
$S(0^n, r + 1)$. Hence knowing $f$ on $S(0^n, 2s)$ suffices to compute $f$ at points of weight $2s + 1$ and beyond.
In particular, the value of $f$ is fixed at all points of weight $n/2$ through $n$ (since $2s \le n/2$). Hence the value
of $f$ is fixed at all points of the ball $B(1^n, 2s)$, and now Theorem 10 finishes the proof. ■


2.1.2 Upper and lower bounds on $|\mathcal{F}(s,n)|$.

Recall that $|\mathcal{F}(s,n)|$ denotes the number of distinct Boolean functions on $n$ variables with sensitivity at
most $s$. We use the notation $\binom{n}{\le k}$ to denote $\sum_{i=0}^{k} \binom{n}{i}$, the cardinality of a Hamming ball of radius $k$.

As an immediate corollary of Theorem 10, we have the following upper bound:

Corollary 12. For all $s \le n$, we have $|\mathcal{F}(s,n)| \le 2^{\binom{n}{\le 2s}}$.

We have the following lower bounds:

Lemma 13. For all $s \le n$, we have $|\mathcal{F}(s,n)| \ge \max\left( \binom{n}{s} 2^{2^s - 1},\ (n - s + 1)^{2^{s-1}} \right)$.

Proof: The first bound comes from considering $s$-juntas. We claim that there are at least $2^{2^s - 1}$ functions
on $s$ variables that depend on all $s$ variables. For any function $f : \{0,1\}^s \to \{0,1\}$ on $s$ variables, either $f$
or $f' = f \oplus \prod_{i=1}^{s} x_i$ is sensitive to all $s$ variables. This is because $f \oplus f' = \prod_{i=1}^{s} x_i$, hence one of them
has full degree as a polynomial over $\mathbb{F}_2$, and hence must depend on all $s$ variables. The bound now follows
by considering all subsets of $s$ of the $n$ variables.

The second bound comes from the addressing functions. Divide the variables into $s - 1$ address variables
$y_1, \ldots, y_{s-1}$ and $n - s + 1$ output variables $x_1, \ldots, x_{n-s+1}$. Consider the addressing function computed by
a decision tree with nodes at the first $s - 1$ levels labelled by $y_1, \ldots, y_{s-1}$ and each leaf labelled by some $x_i$
(the same $x_i$ can be repeated at multiple leaves). It is easy to check that this defines a family of sensitivity $s$
functions, that all the functions in the family are distinct, and that the cardinality is as claimed. ■
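The "easy to check" claims about addressing functions can be verified exhaustively for small parameters. The following is a minimal sketch: the helper names, the encoding of the address bits as a binary number, and the choice $n = 5$, $s = 3$ are assumptions made for illustration.

```python
from itertools import product

def addressing_function(labels, s):
    """f(y, x) = x[labels[y]]: the s - 1 address bits select one output variable."""
    def f(z):
        address = int("".join(map(str, z[:s - 1])), 2) if s > 1 else 0
        return z[s - 1 + labels[address]]
    return f

def max_sensitivity(f, n):
    best = 0
    for x in product((0, 1), repeat=n):
        flips = sum(f(x[:i] + (1 - x[i],) + x[i + 1:]) != f(x) for i in range(n))
        best = max(best, flips)
    return best

n, s = 5, 3
outputs = n - s + 1          # number of output variables
leaves = 2 ** (s - 1)        # number of address values, i.e. leaves of the tree
points = list(product((0, 1), repeat=n))
truth_tables = set()
worst = 0
for labels in product(range(outputs), repeat=leaves):
    f = addressing_function(labels, s)
    truth_tables.add(tuple(f(z) for z in points))
    worst = max(worst, max_sensitivity(f, n))
print(len(truth_tables), outputs ** leaves, worst)
```

For these parameters the number of distinct truth tables equals $(n-s+1)^{2^{s-1}} = 3^4$, and no function in the family exceeds sensitivity $s$.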

In the setting when $s = o(n)$, the gap between our upper and lower bounds is roughly $2^{n^s}$ versus $n^{2^s}$.
The setting where $s = O(\log(n))$ is particularly intriguing.

Problem 14. Is $|\mathcal{F}(2\log(n), n)| = 2^{n^{\omega(1)}}$?


2.2 Noise Stability 

We start by showing that functions with small sensitivity satisfy a strong noise-stability guarantee. 

For a point $x \in \{0,1\}^n$ and $\delta \in [0,1]$, let $N_{1-2\delta}(x)$ denote the $\delta$-noisy version of $x$, i.e. a draw
of $y \sim N_{1-2\delta}(x)$ is obtained by independently setting each bit $y_i$ to be $x_i$ with probability $1 - 2\delta$ and
uniformly random with probability $2\delta$. The noise sensitivity of $f$ at $x$ at noise rate $\delta$, denoted $\mathrm{NS}_\delta[f](x)$, is
defined as

$$\mathrm{NS}_\delta[f](x) = \Pr_{y \sim N_{1-2\delta}(x)}[f(x) \ne f(y)].$$

The noise sensitivity of $f$ at noise rate $\delta$, denoted $\mathrm{NS}_\delta[f]$, is then defined as

$$\mathrm{NS}_\delta[f] = \mathbb{E}_{x \sim \{0,1\}^n}[\mathrm{NS}_\delta[f](x)] = \Pr_{x \sim \{0,1\}^n,\ y \sim N_{1-2\delta}(x)}[f(x) \ne f(y)].$$

The next lemma shows that low-sensitivity functions are noise-stable at every point $x \in \{0,1\}^n$:

Lemma 15. Let $f : \{0,1\}^n \to \{0,1\}$ have sensitivity $s$. For every $x \in \{0,1\}^n$ and $0 < \delta < 1/2$, we have
$\mathrm{NS}_\delta[f](x) < 2\delta s$.

Proof: Let $t \in [n]$. Consider a random process that starts at $x$ and then flips a uniformly random subset
$T \subseteq [n]$ of coordinates of cardinality $t$, which takes it from $x$ to $y \in \{0,1\}^n$. We claim that
$\Pr_T[f(x) \ne f(y)] \le \frac{st}{n-t+1}$. To see this, we can view going from $x$ to $y$ as a walk where at each step, we pick the
next coordinate to walk along uniformly from the set of coordinates that have not been selected so far. Let
$x = x_0, x_1, \ldots, x_t = y$ denote the sequence of vertices visited by this walk. At $x_i$, we choose the next
coordinate to flip uniformly from a set of size $n - i$. Since $x_i$ has at most $s$ sensitive coordinates, we have
$\Pr[f(x_i) \ne f(x_{i+1})] \le \frac{s}{n-i}$. Hence by a union bound we get

$$\Pr[f(x_0) \ne f(x_t)] \le \sum_{i=0}^{t-1} \Pr[f(x_i) \ne f(x_{i+1})] \le \sum_{i=0}^{t-1} \frac{s}{n-i} \le \frac{st}{n-t+1}$$

as claimed.

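Now we turn to noise sensitivity. We can view a draw of $y \sim N_{1-2\delta}(x)$ as first choosing the number $t$
of coordinates of $x$ to flip according to the binomial distribution $\mathrm{Bin}(n, \delta)$, and then flipping a random set
$T \subseteq [n]$ of size $t$. From above, we have $\Pr[f(y) \ne f(x) \mid |T| = t] \le \frac{st}{n-t+1}$. Hence

$$\Pr_{y \sim N_{1-2\delta}(x)}[f(x) \ne f(y)] \le \mathbb{E}_{t \sim \mathrm{Bin}(n,\delta)}\left[\frac{st}{n-t+1}\right] = s \sum_{t=1}^{n} \delta^t (1-\delta)^{n-t} \binom{n}{t} \frac{t}{n-t+1}$$

$$= s \sum_{t=1}^{n} \delta^t (1-\delta)^{n-t} \binom{n}{t-1} = \frac{s\delta}{1-\delta} \sum_{t'=0}^{n-1} \delta^{t'} (1-\delta)^{n-t'} \binom{n}{t'} = \frac{s\delta}{1-\delta} (1 - \delta^n),$$

which is less than $2\delta s$ for $\delta < 1/2$. ■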
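The chain of equalities in this calculation rests on the identity $\binom{n}{t}\frac{t}{n-t+1} = \binom{n}{t-1}$ followed by a shift of the summation index. As a sanity check, the following minimal sketch (function names are illustrative, and exact rational arithmetic is used to avoid rounding) compares the binomial expectation with the closed form $\frac{s\delta}{1-\delta}(1-\delta^n)$ for one arbitrary choice of parameters.

```python
from fractions import Fraction
from math import comb

def expected_walk_bound(n, s, delta):
    """E over t ~ Bin(n, delta) of s*t/(n - t + 1), computed exactly."""
    return sum(
        comb(n, t) * delta**t * (1 - delta)**(n - t) * Fraction(s * t, n - t + 1)
        for t in range(n + 1)
    )

def closed_form(n, s, delta):
    """The closed form s*delta/(1 - delta) * (1 - delta^n)."""
    return s * delta / (1 - delta) * (1 - delta**n)

n, s, delta = 12, 3, Fraction(1, 7)
print(expected_walk_bound(n, s, delta) == closed_form(n, s, delta))
print(closed_form(n, s, delta) < 2 * delta * s)
```

Both comparisons hold for these parameters; the second is the final inequality of the lemma, which uses $\delta/(1-\delta) < 2\delta$ for $\delta < 1/2$.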

We can restrict the noise distribution and get similar bounds. The setting that we now describe, where we
only allow walks in the lower shadow of a vertex, will be useful later when we construct shallow formulas
for a low sensitivity function $f$.

Let $D(x,t)$ denote the points in the lower shadow of $x$ at distance $t$ from it (so a point in $D(x,t)$ is
obtained by flipping exactly $t$ of the bits of $x$ from 1 to 0). We show that a random point in $D(x, t)$ is likely
to agree with $x$ (for $t \le \mathrm{wt}(x)/2s$).

Lemma 16. Let $\mathrm{wt}(x) = d \ge s$. Then if $s(f) \le s$, we have $\Pr_{y \in D(x,t)}[f(x) \ne f(y)] \le \frac{st}{d-t+1}$.

Proof: We consider a family of random walks that we call downward walks. In such a walk, at each step
we pick a random index that is currently 1 and set it to 0. Consider a downward walk of length $t$ and
let $x = x_0, x_1, \ldots, x_t = y$ denote the sequence of vertices that are visited by the walk. We claim that
$\Pr[f(x_i) \ne f(x_{i+1})] \le \frac{s}{d-i}$. To see this observe that out of the $d - i$ possible 1 indices in $x_i$ that could be
flipped to 0, at most $s$ are sensitive. Hence we have

$$\Pr[f(x_0) \ne f(x_t)] \le \sum_{i=0}^{t-1} \Pr[f(x_i) \ne f(x_{i+1})] \le \sum_{i=0}^{t-1} \frac{s}{d-i} \le \frac{st}{d-t+1}.$$

Since $y = x_t$ is a random point in $D(x, t)$, the proof is complete. ■

Corollary 17. Let $\mathrm{wt}(x) = d$ and $t \le d/(10s + 1)$. Then $\Pr_{y \in D(x,t)}[f(y) \ne f(x)] \le 1/10$.
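The corollary is stated without proof; the arithmetic behind it can be spelled out as follows, assuming the bound $\frac{st}{d-t+1}$ that the downward-walk argument of Lemma 16 yields.

```latex
t \le \frac{d}{10s+1}
\;\Longrightarrow\; d \ge (10s+1)\,t
\;\Longrightarrow\; d - t + 1 \ge 10st + 1,
\qquad\text{so}\qquad
\Pr_{y \in D(x,t)}[f(y) \ne f(x)] \;\le\; \frac{st}{d-t+1} \;\le\; \frac{st}{10st+1} \;<\; \frac{1}{10}.
```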

2.3 Bias and Interpolation 

It is known that low sensitivity functions cannot be highly biased. For $f : \{0,1\}^n \to \{0,1\}$, let

$$\mu_0(f) = \Pr_{x \in \{0,1\}^n}[f(x) = 0], \quad \mu_1(f) = \Pr_{x \in \{0,1\}^n}[f(x) = 1], \quad \mu(f) = \min(\mu_0(f), \mu_1(f)).$$

Lemma 18. For $f : \{0,1\}^n \to \{0,1\}$ we have

$$s_0(f) \ge \log_2\left(\frac{1}{\mu_0(f)}\right) \text{ if } \mu_0(f) > 0,$$

$$s_1(f) \ge \log_2\left(\frac{1}{\mu_1(f)}\right) \text{ if } \mu_1(f) > 0.$$

Equality holds iff the set $f^{-1}(b)$ is a subcube.

We note that these bounds are implied by the classical isoperimetric inequality, which in fact shows that
$\mathbb{E}_{x \in f^{-1}(b)}[s(f, x)] \ge \log(1/\mu_b(f))$ for $b = 0, 1$. We present a simple inductive proof of the max-sensitivity
bounds given by Lemma 18 in the appendix.

We say that a set $K \subseteq \{0,1\}^n$ hits a set of functions $\mathcal{F}$ if for every $f \in \mathcal{F}$, there exists $x \in K$ such
that $f(x) \ne 0$. We say that $K$ interpolates $\mathcal{F}$ if for every $f_1 \ne f_2 \in \mathcal{F}$, there exists $x \in K$ such that
$f_1(x) \ne f_2(x)$.

Corollary 19. Let $k \ge C 2^{2s} \binom{n}{\le 4s}$ and let $S$ be a random subset of $\{0,1\}^n$ obtained by taking $k$ points
drawn uniformly from $\{0,1\}^n$ with replacement. The set $S$ interpolates $\mathcal{F}(s,n)$ with probability
$1 - \exp(-\binom{n}{\le 4s})$ (over the choice of $S$).

Proof: We first show that large sets hit $\mathcal{F}(t,n)$ with very high probability. Fix $f \in \mathcal{F}(t, n)$. Since
we have $\mu_1(f) \ge 2^{-t}$ by Lemma 18, the probability that $k$ random points all miss $f^{-1}(1)$ is bounded
by $(1 - 2^{-t})^k \le \exp(-k/2^t)$. By Corollary 12 we have $|\mathcal{F}(t,n)| \le 2^{\binom{n}{\le 2t}}$, so by the union bound, the
probability that $S$ does not hit every function in this set is at most $2^{\binom{n}{\le 2t}} \exp(-k/2^t)$, which is $\exp(-\binom{n}{\le 2t})$
for $k \ge C 2^t \binom{n}{\le 2t}$.

Next, we claim that if $S$ hits $\mathcal{F}(2s, n)$ then it interpolates $\mathcal{F}(s, n)$. Given functions $f_1, f_2 \in \mathcal{F}(s, n)$, let
$g = f_1 \oplus f_2$. It is easy to see that $g \in \mathcal{F}(2s, n)$, and that $g^{-1}(1)$ is the set of points $x$ where $f_1(x) \ne f_2(x)$,
so indeed if $S$ hits $\mathcal{F}(2s, n)$ then it interpolates $\mathcal{F}(s, n)$. Given this, the corollary follows from our lower
bound on $k$, taking $t = 2s$. ■


3 Efficient algorithms for computing low sensitivity functions 

3.1 Small circuits 

In this subsection, we will prove Theorem 4. Recall that the proof of Theorem 10 gives an algorithm to
compute the truth table of $f$ from an advice string which tells us the values on some Hamming ball of
radius $2s + 1$. In this section we present two algorithms which, given this advice, can (relatively) efficiently
compute any entry of the truth table without computing the truth-table in its entirety. This is equivalent to a
small circuit computing $f$. We first give a (non-uniform) “bottom-up” algorithm for computing $f$ at a given
input point $x \in \{0,1\}^n$. In the appendix we describe a “top-down” algorithm with a similar performance
bound.

3.1.1 A Bottom-Up Algorithm 

The algorithm takes as advice the values of $f$ on $B(0^n, 2s)$. It then shifts the center of the ball along a
shortest path from $0^n$ to $x$, computing the values of $f$ on the shifted ball at each step. This computation is
made possible by a lemma showing that when we shift a Hamming ball $B$ by a unit vector to get a new ball
$B'$, points in $B'$ either lie in $B$ or are adjacent to many points in $B$, which lets us apply Lemma 9.

Let $\mathbb{1}(S)$ denote the indicator of $S \subseteq [n]$ and $S \Delta T$ denote the symmetric difference of the sets $S, T$.
For $B \subseteq \{0,1\}^n$ we write $B \oplus e_i$ to denote the pointwise shift of $B$ by the unit vector $e_i$.

Lemma 20. For any $y \in B(x \oplus e_i, r) \setminus B(x, r)$, we have $|N(y) \cap B(x, r)| = r + 1$.

Proof: Fix any such $y$. Since $B(x \oplus e_i, r) = B(x, r) \oplus e_i$, we have that

$y = x' \oplus e_i$ for some $x' \in B(x, r)$, where
$x' = x \oplus \mathbb{1}(S)$ for some $S \subseteq [n]$, $|S| \le r$, and hence
$y = x \oplus \mathbb{1}(S \Delta \{i\})$.

If $i \in S$ or $|S| \le r - 1$, then $|S \Delta \{i\}| \le r$; but this means that $y \in B(x, r)$, which is in contradiction to
our assumption that $y \in B(x \oplus e_i, r) \setminus B(x, r)$. Hence $i \notin S$ and $|S| = r$. But then we have $y \oplus e_j \in B(x, r)$
for precisely those $j$ that belong to $S \cup \{i\}$, which gives the claim. ■

Corollary 21. Knowing the values of $f$ on $B(x, 2s)$ lets us compute $f$ on $B(x \oplus e_i, 2s)$.

Proof: Either $y \in B(x \oplus e_i, 2s)$ lies in $B(x, 2s)$ so we know $f(y)$ already, or by the previous lemma $y$ has
$2s + 1$ neighbors in $B(x, 2s)$, in which case Lemma 9 gives $f(y) = \mathrm{Maj}_{y' \in N(y) \cap B(x,2s)}(f(y'))$. ■

Now we can give our algorithm for computing $f(x)$ at an arbitrary input $x \in \{0,1\}^n$.


Bottom-Up

Advice: The value of $f$ at all points in $B(0^n, 2s)$.

Input: $x \in \{0,1\}^n$.

1. Let $0^n = x_0, x_1, \ldots, x_d = x$ be a shortest path from $0^n$ to $x$.

2. For $i \in \{1, \ldots, d\}$ compute $f$ on $B(x_i, 2s)$ using the values at points
in $B(x_{i-1}, 2s)$.

3. Output $f(x_d)$.


Theorem 22. The algorithm Bottom-Up computes $f(x)$ for any input $x$ in time $O(sn^{2s+1})$ using space
$O(n^{2s})$.


Proof: The values at $B(0^n, 2s)$ are known as advice. Corollary 21 tells us how to compute the values
at $B(x_i, 2s)$ using the values on $B(x_{i-1}, 2s)$. If we store the values at $B(x_{i-1}, 2s)$ in an array indexed by
subsets of size $2s$, the value at any point $y \in B(x_i, 2s)$ can be computed in time $O(s)$, by performing
$2s + 1$ array lookups and then taking the majority. Thus computing the values over the entire ball takes time
$O(sn^{2s})$, and we repeat this $d \le n$ times. Finally, at stage $i$ we only need to store the values of $f$ on the
latest shift, $B(x_{i-1}, 2s)$, so the total space required is $O(n^{2s})$. ■
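The Bottom-Up procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names `ball`, `flip` and `bottom_up` are hypothetical, the ball is stored in a dictionary rather than an array indexed by subsets, and each new point scans all $n$ neighbors instead of only the $2s + 1$ relevant ones, so the running time is a factor of roughly $n/s$ worse than in Theorem 22.

```python
from itertools import combinations, product

def ball(center, radius):
    """All points within Hamming distance `radius` of `center`."""
    n = len(center)
    for k in range(radius + 1):
        for flips in combinations(range(n), k):
            yield tuple(b ^ 1 if i in flips else b for i, b in enumerate(center))

def flip(x, i):
    return x[:i] + (x[i] ^ 1,) + x[i + 1:]

def bottom_up(advice, x, s):
    """Compute f(x) from the values of f on B(0^n, 2s), shifting the ball step by step."""
    n = len(x)
    center = (0,) * n
    values = dict(advice)                          # f on B(center, 2s)
    for i in [j for j in range(n) if x[j] == 1]:   # a shortest path from 0^n to x
        new_center = flip(center, i)
        new_values = {}
        for y in ball(new_center, 2 * s):
            if y in values:                        # y already lies in the old ball
                new_values[y] = values[y]
            else:                                  # Lemma 20: 2s + 1 neighbors in the old ball
                votes = [values[z] for z in (flip(y, j) for j in range(n)) if z in values]
                new_values[y] = 1 if 2 * sum(votes) > len(votes) else 0
        center, values = new_center, new_values
    return values[x]

# Illustrative example: a one-address-bit addressing function (sensitivity 2).
f = lambda x: x[0] if x[1] else x[2]
n, s = 7, 2
advice = {y: f(y) for y in ball((0,) * n, 2 * s)}
print(all(bottom_up(advice, x, s) == f(x) for x in product((0, 1), repeat=n)))  # True
```

Only the values on the current ball are kept between steps, which mirrors the space bound in the theorem.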

3.2 Small-depth Formulas

Theorem 22 established that any $n$-variable sensitivity-$s$ function $f$ is computed by a circuit of size $O(sn^{2s+1})$,
but of relatively large depth $O(n^{2s})$. In this section we improve this depth by showing that shallow circuits
of essentially the same size (equivalently, formulas of small depth) can compute low-sensitivity functions.

For $\mu < 1/2$, let $B(c, \mu)$ denote the product distribution over $y \in \{0,1\}^c$ where $\Pr[y_i = 1] = \mu$ for
each $i \in [c]$. For constants $1/2 > \mu > \delta > 0$, let $c = c(\mu, \delta) \in \mathbb{Z}$ be the smallest integer constant such that

$$\Pr_{y \sim B(c,\mu)}[\mathrm{Maj}_{i \in [c]}(y_i) = 1] \le \delta.$$
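For concrete $\mu$ and $\delta$ the constant can be computed from the binomial tail. The following is a minimal sketch with one loudly stated assumption: it searches only over odd $c$, so that the majority is never tied; the text does not specify a tie-breaking rule, and a different convention for even $c$ could give a smaller value. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def majority_error(c, mu):
    """Pr[Maj(y_1..y_c) = 1] when each y_i = 1 independently with probability mu (c odd)."""
    return sum(comb(c, k) * mu**k * (1 - mu)**(c - k) for k in range(c // 2 + 1, c + 1))

def smallest_odd_c(mu, delta):
    """Smallest odd c with majority_error(c, mu) <= delta (odd-only search is an assumption)."""
    c = 1
    while majority_error(c, mu) > delta:
        c += 2
    return c

print(smallest_odd_c(Fraction(1, 5), Fraction(1, 20)))  # 7
```

For $\mu = 1/5$ and $\delta = 1/20$, five samples give an error of about $0.058$, which is still too large, while seven samples give about $0.033$.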

We now present a randomized parallel algorithm for computing f(x). 


Parallel-Algorithm

Advice: $f$ at all points in $B(0^n, 10s)$.

Input: $x \in \{0,1\}^n$.

Let $d = \mathrm{wt}(x)$, $t = \lfloor d/(10s + 1) \rfloor$, $c = c(1/5, 1/20)$.

1. If $d \le 10s$, return $A(x) = f(x)$.

2. Else sample $y_1, \ldots, y_c$ randomly from $D(x,t)$. Recursively run
Parallel-Algorithm to compute $A(y_i)$ in parallel for all $i \in [c]$.

3. Return $A(x) = \mathrm{Maj}_{i \in [c]}(A(y_i))$.


For brevity we use $A$ to denote the algorithm above and $A(x) \in \{0,1\}$ to denote the random variable
which is its output on input $x$. For $d \ge 10s + 1$, the random choices of $A$ in computing $A(x)$ are described
by a $c$-regular tree. The tree's root is labeled by $x$ and its children are labeled by $y_1, \ldots, y_c$; its leaves are
labeled by strings that each have Hamming weight at most $10s$. Further, the various subtrees rooted at each
level are independent of each other.

Theorem 23. The algorithm runs in parallel time $O(s \log n)$ using $n^{O(s)}$ processors. For any $x \in \{0,1\}^n$,
we have $\Pr_A[A(x) \ne f(x)] \le \frac{1}{20}$, where $\Pr_A$ denotes that the probability is over the random coin tosses of
the algorithm.

Proof: We first prove the correctness of the algorithm by induction on $\mathrm{wt}(x) = d$. When $d \le 10s$, the
claim follows trivially. Assume that the claim is true for $\mathrm{wt}(x) \le d - 1$, and consider an input $x$ of weight
$d$. Note that every $y \in D(x, t)$ has $\mathrm{wt}(y) = d - t \le d - 1$, hence the inductive hypothesis applies to it. For
each $i \in [c]$, we independently have

$$\Pr_{A,\ y_i \in D(x,t)}[A(y_i) \ne f(x)] \le \Pr_A[A(y_i) \ne f(y_i)] + \Pr_{y_i \in D(x,t)}[f(y_i) \ne f(x)] \le \frac{1}{20} + \frac{1}{10} < \frac{1}{5},$$

where the 1/10 bound is by Corollary 17 and the 1/20 is by the inductive hypothesis. The algorithm samples
$c$ independent points $y_i \in D(x,t)$, computes $A(y_i)$ for each of them using independent randomness, and
then returns the majority of $A(y_i)$ over those $i \in [c]$. Hence, by our choice of $c = c(1/5, 1/20)$, we have
that $\Pr_A[\mathrm{Maj}_{i \in [c]}(A(y_i)) \ne f(x)] \le \frac{1}{20}$.
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To bound the running time, we observe that for d > 10s + 1, 


t = 


d 


10s + 1 


>-. so d 

~ 25s 


t < d 



But this implies that in k = 0(s log d) steps, the weight reduces below 10s + 1. The number of processors 
required is bounded by c k = n°^ s \ • 
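As a sanity check on the depth bound, the following sketch iterates the weight recurrence $d \mapsto d - \lfloor d/(10s+1) \rfloor$ and compares the number of levels with $\lceil 25 s \ln(d/(10s)) \rceil$, the count that follows from a per-level shrinkage factor of $1 - 1/(25s)$. The function names are hypothetical.

```python
import math

def steps_to_base(d, s):
    """Number of recursion levels until the weight drops to at most 10*s."""
    steps = 0
    while d > 10 * s:
        d -= d // (10 * s + 1)   # t = floor(d / (10s + 1)) is at least 1 here
        steps += 1
    return steps

def step_bound(d, s):
    """Levels allowed by d - t <= d * (1 - 1/(25 s)); requires d > 10 * s."""
    return math.ceil(25 * s * math.log(d / (10 * s)))
```

For example, `steps_to_base(11, 1)` needs a single level, while `step_bound(11, 1)` allows three.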

By hardwiring the random bits and the advice bits, we can conclude that functions with low sensitivity 
have small-depth formulas, thus proving Theorem 5. 


4 Self-correction 


In this section we show that functions with low sensitivity admit self-correctors. Recall that for Boolean
functions $f, g : \{0,1\}^n \to \{0,1\}$ we write $\delta(f,g)$ to denote $\Pr_{x \sim \{0,1\}^n}[f(x) \ne g(x)]$.

Our self-corrector is given a function $r : \{0,1\}^n \to \{0,1\}$ such that there exists $f \in \mathcal{F}(s,n)$ satisfying
$\delta(r,f) \le 2^{-cs}$ for some constant $c > 2$ to be specified later. By Lemma 18, it follows that any two
sensitivity $s$ functions differ in at least a $2^{-2s}$ fraction of points, so if such a function $f$ exists, it must be
unique. We consider two settings (in analogy with coding theory): in the global setting, the self-corrector
is given the truth-table of $r$ as input and is required to produce the truth-table of $f$ as output. In the local
setting, the algorithm has black-box oracle access to $r$. It is given $x \in \{0,1\}^n$ as input, and the desired
output is $f(x)$.

At a high level, our self-corrector relies on the fact that small-sensitivity sets are noise-stable at noise
rate $\delta \approx 1/s$, by Lemma 15, whereas small sets of density $\mu \le c^{-s}$ tend to be noise sensitive. The analysis
uses the hypercontractivity of the $T_{1-2\delta}(\cdot)$ operator.

Following [O'D14], for $f : \{0,1\}^n \to \mathbb{R}$, we define

$$T_{1-2\delta} f(x) = \mathbb{E}_{y \sim N_{1-2\delta}(x)}[f(y)],$$

where recall that a draw of $y \sim N_{1-2\delta}(x)$ is obtained by independently setting each bit $y_i$ to be $x_i$ with
probability $1 - 2\delta$ and uniformly random with probability $2\delta$. We can view $(x,y)$ where $x \sim \{0,1\}^n$ and
$y \sim N_{1-2\delta}(x)$ as defining a distribution on the edges of the complete graph on the vertex set $\{0,1\}^n$. We
refer to this weighted graph as the $\delta$-noisy hypercube. The $(2,q)$-Hypercontractivity Theorem quantifies the
expansion of the noisy hypercube:

Theorem 24. ($(2,q)$-Hypercontractivity.) Let $f : \{0,1\}^n \to \mathbb{R}$. Then

$$\|T_{1-2\delta} f\|_q \le \|f\|_2 \quad \text{for } 2 \le q \le 1 + \frac{1}{(1-2\delta)^2}.$$

We need the following consequence, which says that for any small set $S$, most points do not have too
many neighbors in the noisy hypercube that lie within $S$. For $S \subseteq \{0,1\}^n$, let us define the set $A_{\delta,\theta}(S)$ of
those points for which a $\theta$ fraction of neighbors in the $\delta$-noisy hypercube lie in $S$. Formally,

$$A_{\delta,\theta}(S) = \left\{ x \in \{0,1\}^n \text{ s.t. } \Pr_{y \sim N_{1-2\delta}(x)}[y \in S] \ge \theta \right\}.$$

Abusing the notation from Section 2.3, for $S \subseteq \{0,1\}^n$ we write $\mu(S)$ to denote $\Pr_{x \sim \{0,1\}^n}[x \in S]$.

Lemma 25. We have

$$\mu(A_{\delta,\theta}(S)) \le \frac{\mu(S)^{1+2\delta}}{\theta^{2+4\delta}}.$$

Proof: Let $f(x) = \mathbb{1}(x \in S)$. Then

$$T_{1-2\delta} f(x) = \Pr_{y \sim N_{1-2\delta}(x)}[y \in S].$$

Hence $A_{\delta,\theta}(S)$ is the set of those $x$ such that $T_{1-2\delta} f(x) \ge \theta$.

Let $q = 2(1+2\delta)$. It is easy to see that $q$ satisfies the hypothesis of Theorem 24. Hence we can bound
the $q$th moment of $T_{1-2\delta} f$ as

$$\mathbb{E}_{x \in \{0,1\}^n}\left[(T_{1-2\delta} f(x))^q\right] \le \|f\|_2^q = \mu(S)^{q/2}.$$

Hence by Markov's inequality,

$$\Pr_{x \in \{0,1\}^n}[T_{1-2\delta} f(x) \ge \theta] \le \frac{\mu(S)^{q/2}}{\theta^q}.$$

The claim follows from our choice of $q$. ■
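For very small cubes the quantities in Lemma 25 can be computed exactly by enumeration. The sketch below assumes the bound has the form $\mu(S)^{1+2\delta}/\theta^{2+4\delta}$ (the form implied by the choice $q = 2(1+2\delta)$ in the proof) and uses the fact that under $N_{1-2\delta}$ each bit ends up flipped with probability $\delta$. All function names are illustrative.

```python
from itertools import product

def hit_probability(x, S, delta):
    """Pr over y ~ N_{1-2*delta}(x) that y lands in S, i.e. T_{1-2*delta} 1_S (x)."""
    n = len(x)
    total = 0.0
    for y in S:
        flips = sum(a != b for a, b in zip(x, y))
        # Each bit is flipped with probability delta, kept with 1 - delta.
        total += delta ** flips * (1 - delta) ** (n - flips)
    return total

def density_of_A(S, n, delta, theta):
    """mu(A_{delta,theta}(S)), computed by enumerating the whole cube."""
    count = sum(1 for x in product((0, 1), repeat=n)
                if hit_probability(x, S, delta) >= theta)
    return count / 2 ** n

def lemma25_bound(S, n, delta, theta):
    """Right-hand side mu(S)^(1+2*delta) / theta^(2+4*delta)."""
    mu = len(S) / 2 ** n
    return mu ** (1 + 2 * delta) / theta ** (2 + 4 * delta)
```

With `S = {(0, 0, 0, 0)}`, `delta = 0.1` and `theta = 0.3`, only the point $0^4$ itself has at least a 0.3 chance of landing in `S`, so `density_of_A` returns 1/16.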

Corollary 26. If $\mu(S) \le \theta^{4+2/\delta}$, then $\mu(A_{\delta,\theta}(S)) \le \mu(S)^{1+\delta}$.

Proof: By Lemma 25, it suffices that $\mu(S)^{1+2\delta}/\theta^{2+4\delta} \le \mu(S)^{1+\delta}$, and it is easy to check that this condition
holds for our choice of $\mu(S)$. ■
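The check can be spelled out as follows, assuming $\mu(S) > 0$ (the case $\mu(S) = 0$ is trivial) and the Lemma 25 bound in the form $\mu(S)^{1+2\delta}/\theta^{2+4\delta}$:

```latex
\frac{\mu(S)^{1+2\delta}}{\theta^{2+4\delta}} \le \mu(S)^{1+\delta}
\;\iff\; \mu(S)^{\delta} \le \theta^{2+4\delta}
\;\iff\; \mu(S) \le \theta^{(2+4\delta)/\delta} = \theta^{4 + 2/\delta}.
```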


4.1 Global Self-correction 

Our global self-corrector is given a function $r : \{0,1\}^n \to \{0,1\}$ such that there exists $f \in \mathcal{F}(s,n)$
satisfying $\delta(r,f) \le 2^{-c_1 s}$ for some constant $c_1 > 2$ to be specified later. By Lemma 18, it follows that any
two sensitivity $s$ functions differ in at least a $2^{-2s}$ fraction of points, so such a function $f$, if it exists, must be
unique. Our self-corrector defines a sequence of functions $f_0, \ldots, f_k$ such that $f_0 = r$ and $f_k = f$ (with
high probability).

Global Self-corrector

Input: $r : \{0,1\}^n \to \{0,1\}$ such that $\delta(r,f) \le 2^{-c_1 s}$ for some $f \in \mathcal{F}(s,n)$.

Output: The sensitivity-$s$ function $f$.

Let $f_0 = r$, $k = c_2 s \log(n/s)$, $\delta = 1/(20s)$.

For $t = 1, \ldots, k$,

For every $x \in \{0,1\}^n$,

Let $f_t(x) = \mathrm{Maj}_{y \sim N_{1-2\delta}(x)}(f_{t-1}(y))$.

Return $f_k$.
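A brute-force rendering of this iteration on a truth table might look like the sketch below. It is only feasible for tiny $n$, and it takes `delta` and `rounds` as free parameters instead of fixing $\delta = 1/(20s)$ and $k = c_2 s \log(n/s)$; on a toy cube the analysis' parameter regime does not apply, so a larger noise rate is needed for corrupted points to be outvoted. Points are encoded as integers whose bits are the coordinates.

```python
def global_self_correct(r, n, delta, rounds):
    """Iterate f_t(x) = Maj_{y ~ N_{1-2*delta}(x)} f_{t-1}(y) on a truth table.

    r: list of 2**n bits indexed by the integer encoding of x.
    """
    size = 2 ** n
    # weight[h]: probability that y differs from x in one fixed set of h bits.
    weight = [delta ** h * (1 - delta) ** (n - h) for h in range(n + 1)]
    table = list(r)
    for _ in range(rounds):
        new_table = []
        for x in range(size):
            # Probability mass of noisy neighbours y with f_{t-1}(y) = 1.
            mass_one = sum(weight[bin(x ^ y).count("1")]
                           for y in range(size) if table[y] == 1)
            new_table.append(1 if mass_one > 0.5 else 0)
        table = new_table
    return table
```

For instance, with $n = 6$, the dictator function $f(x) = x_0$, a single corrupted entry, and `delta = 0.25`, one round already restores $f$, and $f$ is a fixed point of the iteration.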

The algorithm runs in time $2^{O(n)}$, which is polynomial in the length of its output (which is a truth table
of size $2^n$). To analyze the algorithm, let us define the sets $S_t$ for $t \in \{0, \ldots, k\}$ as

$$S_t = \{x \in \{0,1\}^n \text{ such that } f_t(x) \ne f(x)\}.$$

The following is the key lemma for the analysis.

Lemma 27. We have $S_t \subseteq A_{\delta,2/5}(S_{t-1})$.

Proof: For $x \in S_t$,

$$f(x) \ne \mathrm{Maj}_{y \sim N_{1-2\delta}(x)}(f_{t-1}(y)),$$

hence

$$\Pr_{y \sim N_{1-2\delta}(x)}[f(x) \ne f_{t-1}(y)] \ge \frac{1}{2}.$$

We can upper bound this probability by

$$\Pr_{y \sim N_{1-2\delta}(x)}[f(x) \ne f_{t-1}(y)] \le \Pr_{y \sim N_{1-2\delta}(x)}[f(x) \ne f(y)] + \Pr_{y \sim N_{1-2\delta}(x)}[f(y) \ne f_{t-1}(y)].$$

Since the distributions $\mathrm{Noise}_\delta(x)$ and $N_{1-2\delta}(x)$ are identical, we can bound the first term by Lemma 15,
which gives

$$\Pr_{y \sim N_{1-2\delta}(x)}[f(x) \ne f(y)] \le 2s\delta = \frac{1}{10}.$$

Hence

$$\Pr_{y \sim N_{1-2\delta}(x)}[f(y) \ne f_{t-1}(y)] \ge \frac{1}{2} - \frac{1}{10} = \frac{2}{5}.$$

But $f(y) \ne f_{t-1}(y)$ implies $y \in S_{t-1}$, hence by definition of $A_{\delta,\theta}(S)$ we have $x \in A_{\delta,2/5}(S_{t-1})$. ■

We can now analyze our global self-corrector.


Theorem 28. There exist constants $c_1, c_2$ such that if $\delta(r,f) \le 2^{-c_1 s}$ for some $f \in \mathcal{F}(s,n)$, then for
$k \ge c_2 s \log(n/s)$, we have $f_k = f$.

Proof: Let $\delta = 1/(20s)$. Assume that there exists $f \in \mathcal{F}(s,n)$ such that

$$\delta(r,f) = \mu(S_0) \le 2^{-c_1 s} \le (2/5)^{4+40s}.$$

By Lemma 27 and Corollary 26, we have

$$\mu(S_t) \le \mu(A_{\delta,2/5}(S_{t-1})) \le \mu(S_{t-1})^{1+\delta} \le \mu(S_0)^{(1+\delta)^t}.$$

For $t \ge c_2' \ln(n/s)/\delta = c_2 s \log(n/s)$, we have

$$\mu(S_t) \le \mu(S_0)^{(1+\delta)^t} < 2^{-n}.$$

But since $S_t \subseteq \{0,1\}^n$, it must be the empty set, and this implies that $f_t = f$. ■


4.2 Local Self-Correction 

Recall that in the local self-correction problem, the algorithm is given $x \in \{0,1\}^n$ as input and oracle access
to $r : \{0,1\}^n \to \{0,1\}$ such that $\delta(r,f) \le 2^{-d_1 s}$ for some constant $d_1 > 2$ to be specified later. The goal is
to compute $f(x)$. Our local algorithm can be viewed as derived from the global algorithm, where we replace
the Majority computation with sampling, and only compute the parts of the truth tables that are essential to
computing $f_k(x)$.

We define a distribution $\mathcal{T}_k(x)$ over $c$-regular trees of depth $k$ rooted at $x$, where each tree node is
labelled with a point in $\{0,1\}^n$. To sample a tree $T_1(x)$ from $\mathcal{T}_1(x)$, we place $x$ at the root, then sample $c$
independent points from $N_{1-2\delta}(x)$, and place them at the leaves. To sample a tree $T_k(x)$ from $\mathcal{T}_k(x)$, we
first sample $T_{k-1}(x) \sim \mathcal{T}_{k-1}(x)$ and then for every leaf $x_j \in T_{k-1}(x)$, we sample $c$ independent points
according to $N_{1-2\delta}(x_j)$, and make them the children of $x_j$. (Note the close similarity between these trees
and the trees discussed in Section 3.2. The difference is that the trees of Section 3.2 correspond to random
walks that are constructed to go downward, while now the random walks corresponding to the noise process
$N_{1-2\delta}(\cdot)$ do not have this constraint.)

Given oracle access to $r : \{0,1\}^n \to \{0,1\}$, we use the tree $T_k(x)$ to compute a guess for the value of
$f(x)$, by querying $r$ at the leaves and then using Recursive Majority. In more detail, we define functions
$r_0, r_1, \ldots, r_k$ which collectively assign a guess for every node in $T_k$. (Formally, each $r_i$ is a function
from $L(T_k(x), i)$ to $\{0,1\}$, where $L(T_k(x), i)$ is the set of points in $\{0,1\}^n$ that are at the nodes at depth
$k - i$ in $T_k(x)$.) For each leaf $y$, we let $r_0(y) = r(y)$. Once $r_{k-t}$ has been defined for nodes at depth $t$ in
$T_k(x)$, given $y$ at depth $t-1$ in $T_k(x)$, we set $r_{k-t+1}(y)$ to be the majority of $r_{k-t}$ at its children. We output
$r_k(x)$ as our estimate for $f(x)$.


Local Self-corrector

Input: $x \in \{0,1\}^n$, oracle for $r : \{0,1\}^n \to \{0,1\}$ such that $\delta(r,f) \le 2^{-d_1 s}$
for some $f \in \mathcal{F}(s,n)$.

Output: $b \in \{0,1\}$ which equals $f(x)$ with probability $1 - \varepsilon$.

Let $\delta = 1/(20s)$, $c = c(1/4, \varepsilon)$, $k \in \mathbb{Z}$.

Sample $T_k \sim \mathcal{T}_k(x)$.

For each leaf $y \in T_k$, query $r(y)$.

For $i = 0$ to $k$, compute $r_i : L(T_k(x), i) \to \{0,1\}$ as described above.

Output $r_k(x)$.
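The procedure can be sketched in Python as a depth-first recursion. This sketch samples the tree lazily during evaluation, which yields the same output distribution as sampling $T_k$ first and then evaluating the Recursive Majority bottom-up. The parameters `c` and `k` are supplied by the caller (the constants $c(1/4,\varepsilon)$ and $d_2$ are not computed here), and all names are illustrative.

```python
import random

def sample_noise(x, delta, rng):
    """Draw y ~ N_{1-2*delta}(x): keep each bit w.p. 1-2*delta, else rerandomize."""
    return tuple(b if rng.random() < 1 - 2 * delta else rng.randint(0, 1)
                 for b in x)

def local_self_correct(x, oracle, delta, c, k, rng):
    """Recursive majority over a freshly sampled c-regular tree of depth k.

    oracle: callable giving r(y); it is queried only at the c**k leaves.
    c: number of children per node (assumed odd, so majorities have no ties).
    """
    if k == 0:
        return oracle(x)                      # leaf: r_0(y) = r(y)
    votes = [local_self_correct(sample_noise(x, delta, rng),
                                oracle, delta, c, k - 1, rng)
             for _ in range(c)]
    return 1 if 2 * sum(votes) > c else 0     # majority of the children
```

The number of oracle queries is exactly `c ** k`, matching the leaf count used in the query bound.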


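To analyze the algorithm, for $k \in \mathbb{Z}$ define

$$S_k = \left\{ x \in \{0,1\}^n \text{ such that } \Pr_{T_k(x) \sim \mathcal{T}_k(x)}[r_k(x) \ne f(x)] > \varepsilon \right\}.$$

The following is analogous to Lemma 27:

Lemma 29. For $k \ge 1$ and $\varepsilon \le 1/25$, we have $S_k \subseteq A_{\delta,1/10}(S_{k-1})$.

Proof: We have $r_k(x) = \mathrm{Maj}_{1 \le i \le c}(b_i)$ where each $b_i$ is drawn independently according to $r_{k-1}(N_{1-2\delta}(x))$.
If $x \in S_k$, then by our choice of $c = c(1/4, \varepsilon)$,

$$\Pr_{y \sim N_{1-2\delta}(x),\; T_{k-1}(y) \sim \mathcal{T}_{k-1}(y)}[r_{k-1}(y) \ne f(x)] > \frac{1}{4}.$$

On the other hand, we also have

$$\Pr_{y \sim N_{1-2\delta}(x),\; T_{k-1}(y) \sim \mathcal{T}_{k-1}(y)}[r_{k-1}(y) \ne f(x)] \le \Pr_{y \sim N_{1-2\delta}(x)}[f(y) \ne f(x)] + \Pr_{y \sim N_{1-2\delta}(x),\; T_{k-1}(y) \sim \mathcal{T}_{k-1}(y)}[r_{k-1}(y) \ne f(y)].$$

The first term on the right-hand side is bounded by 1/10 by Lemma 15. Hence we have

$$\Pr_{y \sim N_{1-2\delta}(x),\; T_{k-1}(y) \sim \mathcal{T}_{k-1}(y)}[r_{k-1}(y) \ne f(y)] > \frac{1}{4} - \frac{1}{10} > \frac{1}{7}.$$

But by the definition of $S_{k-1}$,

$$\Pr_{y \sim N_{1-2\delta}(x),\; T_{k-1}(y) \sim \mathcal{T}_{k-1}(y)}[r_{k-1}(y) \ne f(y)] \le \varepsilon \cdot \Pr_{y \sim N_{1-2\delta}(x)}[y \notin S_{k-1}] + \Pr_{y \sim N_{1-2\delta}(x)}[y \in S_{k-1}] \le \varepsilon + \Pr_{y \sim N_{1-2\delta}(x)}[y \in S_{k-1}].$$

Hence for $\varepsilon \le 1/25$,

$$\Pr_{y \sim N_{1-2\delta}(x)}[y \in S_{k-1}] > \frac{1}{7} - \varepsilon > \frac{1}{10},$$

so by the definition of $A_{\delta,\theta}(S)$ we have $x \in A_{\delta,1/10}(S_{k-1})$. ■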

We can now analyze our local self-corrector. 

Theorem 30. There exist constants $d_1, d_2$ such that if $\delta(r,f) \le 2^{-d_1 s}$ for some $f \in \mathcal{F}(s,n)$, then for
$k \ge d_2 s \log(n/s)$ we have that $r_k(x) = f(x)$ with probability 0.99. The algorithm queries the oracle for $r$
at $(n/s)^{O(s)}$ points.

Proof: Let $\delta = 1/(20s)$. Let $d_1 > 0$ be such that

$$2^{-d_1 s} \le (0.1)^{4+60s}.$$

Assume there exists $f \in \mathcal{F}(s,n)$ such that

$$\delta(r,f) \le 2^{-d_1 s}.$$

Observe that $r_0(x) = r(x)$, so consequently $\mu(S_0) = \delta(r,f)$. By Lemma 29 and Corollary 26, we have

$$\mu(S_k) \le \mu(A_{\delta,1/10}(S_{k-1})) \le \mu(S_{k-1})^{1+\delta} \le \mu(S_0)^{(1+\delta)^k}.$$

For $k \ge d_2' \ln(n/s)/\delta = d_2 s \log(n/s)$, we have $\mu(S_k) < 2^{-n}$, so $S_k$ must be the empty set. But this implies
that $r_k(x) = f(x)$ except with probability $\varepsilon$.

The number of queries to the oracle is bounded by the number of leaves in the tree, which is $c^k$. Setting
$\varepsilon = 1/100$, since $c(1/4, 1/100) = O(1)$, this is at most $c^k = (n/s)^{O(s)}$. We can amplify the success
probability to $1 - \varepsilon$ using $c(1/100, \varepsilon) = O(\log(1/\varepsilon))$ independent repetitions. ■


Discussion. Every real polynomial of degree $d$ computing a Boolean function is also a degree $d$ polynomial
over $\mathbb{F}_2$. Hence, it has a natural self-corrector which queries the value at a random affine subspace of
dimension $d+1$ containing $x$, and then outputs the XOR of those values. Conjecture 2 implies that this
self-corrector also works for low sensitivity functions. The parameters one would get are incomparable
to Theorem 6; we find it interesting that this natural self-corrector is very different from the algorithm of
Theorem 6.

We further remark that every Boolean function with real polynomial degree $\deg(f) \le d$ satisfies $s(f) \le
O(d^2)$ (recall Theorem 7). Thus, Theorem 6 gives a self-corrector for functions with $\deg(f) \le d$ that
has query complexity $n^{O(d^2)}$. It is interesting to note (by considering the example of parity) that this
performance guarantee does not extend to all functions with $\mathbb{F}_2$ degree $\deg_2(f) \le d$.


5 Propagation rules


We have seen that low-degree functions and low-sensitivity functions share the property that they are
uniquely specified by their values on small-radius Hamming balls. In either case, we can use these values
over a small Hamming ball to infer the values at other points in $\{0,1\}^n$ using simple “local propagation”
rules. The propagation rules for the two types of functions are quite different, but Conjecture 2 and its
converse given by Theorem 7 together imply that the two rules must converge beyond a certain radius. In this
section, we discuss this as a possible approach to Conjecture 2 and some questions that arise from it.

5.1 Low sensitivity functions: the Majority Rule 

If $f : \{0,1\}^n \to \{0,1\}$ has sensitivity $s$, Theorem 10 implies that given the values of $f$ on a ball of radius
$2s$, we can recover $f$ at points at distance $r+1 \ge 2s+1$ from the center by taking the Majority value
over its neighbors at distance $r$ (see Equation (1)). It is worth noting that as $r$ gets large, the Majority is
increasingly lopsided: at most $s$ out of the $r+1$ neighbors are in the minority. We refer to the process of inferring $f$'s
values everywhere from its values on a ball via the Majority rule, increasing the distance from the center by
one at a time, as “applying the Majority rule”.

For concreteness, let us consider the ball centered at $0^n$. If there exists a sensitivity $s$ function $f :
\{0,1\}^n \to \{0,1\}$ such that the points in $B(0^n, 2s)$ are labelled according to $f$, then applying the
Majority rule recovers $f$. However, not every labeling of $B(0^n, 2s)$ will extend to a low sensitivity function
on $\{0,1\}^n$ via the Majority rule. It is an interesting question to characterize such labelings; progress here
will likely lead to progress on Question 14. An obvious necessary condition is that every point in $B(0^n, 2s)$
should have sensitivity at most $s$, but this is not sufficient. This can be seen by considering the DNF version
of the “tribes” function, where there are $n/s$ disjoint tribes, each tribe is of size $s$, and $n > s^2$. (So this
function $f$ is an $(n/s)$-way OR of $s$-way ANDs over disjoint sets of variables.) Every $x \in B(0^n, 2s)$ has
$s(f,x) \le s$ (in fact, this is true for every $x \in B(0^n, s(s-1))$), but it can be verified that applying
the Majority rule starting from the ball of radius $2s$ does recover the Tribes function, which has sensitivity
$n/s > s$. Another natural question is whether there is a nice characterization of the class of functions that
can be obtained by applying the Majority rule to a labeling of $B(0^n, 2s)$.
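The layer-by-layer propagation from the ball around $0^n$ can be sketched as follows. Points are represented by the `frozenset` of their 1-coordinates, and an exact tie among the neighbors is recorded as `None` (undetermined); both conventions, and the function name, are choices made for this illustration.

```python
from itertools import combinations

def apply_majority_rule(ball_values, n, radius):
    """Extend a labeling of B(0^n, radius) to the whole cube, layer by layer.

    ball_values: dict mapping frozenset of 1-coordinates (size <= radius) to 0/1.
    Returns a dict over all subsets of range(n); a tie yields None.
    """
    values = dict(ball_values)
    for weight in range(radius + 1, n + 1):
        for ones in combinations(range(n), weight):
            point = frozenset(ones)
            # Neighbors one layer closer to 0^n: turn off a single 1-coordinate.
            votes = [values[point - {i}] for i in point]
            n_one, n_zero = votes.count(1), votes.count(0)
            if n_one > n_zero:
                values[point] = 1
            elif n_zero > n_one:
                values[point] = 0
            else:
                values[point] = None      # no clear majority
    return values
```

For example, labeling $B(0^6, 2)$ according to the dictator function $f(x) = x_0$ (sensitivity 1) and applying the rule recovers $f$ on all 64 points.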

5.2 Low degree functions: the Parity Rule 

It is well known that all functions $f : \{0,1\}^n \to \mathbb{R}$ with $\deg(f) \le d$ are uniquely specified by their values
on a ball of radius $d$. This follows from the Möbius inversion formula. Again, let us fix the center to be $0^n$
for concreteness. Letting $\mathbb{1}(T)$ denote the indicator vector of the set $T$, the formula (see e.g. Section 2.1 of
[Juk12]) states that

$$f(x) = \sum_{S \subseteq [n]} c_S \prod_{i \in S} x_i \quad \text{where} \quad c_S = \sum_{T \subseteq S} (-1)^{|S|-|T|} f(\mathbb{1}(T)). \qquad (2)$$

From this it can be inferred that if $\deg(f) \le d$, then for $|S| \ge d+1$, we have

$$f(\mathbb{1}(S)) = \sum_{T \subsetneq S} (-1)^{|S|-|T|+1} f(\mathbb{1}(T)). \qquad (3)$$

We will refer to Equation (3) as the “Parity rule”, since it states that $f$ is uncorrelated with the parity of the
variables in $S$ on the subcube given by $\{\mathbb{1}(T) : T \subseteq S\}$. We refer to the process of inferring $f$'s values
everywhere from its values on a ball of radius $d$ via the Parity rule, increasing the distance from the center
by one at a time, as “applying the Parity rule”.
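As an illustration, the sketch below applies the Parity rule, in the form $f(\mathbb{1}(S)) = \sum_{T \subsetneq S} (-1)^{|S|-|T|+1} f(\mathbb{1}(T))$, by increasing $|S|$, representing each point by the `frozenset` of its 1-coordinates. The function name and data layout are assumptions of this example.

```python
from itertools import combinations

def apply_parity_rule(ball_values, n, d):
    """Extend values on B(0^n, d) to the whole cube, by increasing |S|.

    ball_values: dict mapping frozenset T (|T| <= d) to the value f(1(T)).
    """
    values = dict(ball_values)
    for size in range(d + 1, n + 1):
        for S in map(frozenset, combinations(range(n), size)):
            total = 0
            for k in range(size):                  # proper subsets only
                sign = (-1) ** (size - k + 1)
                total += sign * sum(values[frozenset(T)]
                                    for T in combinations(sorted(S), k))
            values[S] = total
    return values
```

Starting from the values of $x_0 \oplus x_1$ (real degree 2) on $B(0^5, 2)$, the rule recovers the function everywhere. Starting instead from the radius-1 labeling with $f(0^n) = 0$ and $f(e_i) = 1$ for all $i$, the extension is $\sum_i x_i$, which takes the value 2 at weight-2 points and is therefore not Boolean.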

Given a (partial) function $f : B(0^n, d) \to \{0,1\}$, applying the Parity rule starting from the values of
$f$ on $B(0^n, d)$ lets us extend $f$ to all of $\{0,1\}^n$. Note that the resulting total function $f$ is guaranteed to
have degree at most $d$, but it is not guaranteed to be Boolean-valued everywhere on $\{0,1\}^n$. Indeed, an easy
counting argument (see e.g. Lemma 31 of [MORS07]) shows that there are at most $2^{d^2 2^{2d}} \cdot \binom{n}{d 2^d}$ degree-$d$
Boolean functions over $\{0,1\}^n$, whereas the number of partial functions $f : B(0^n, d) \to \{0,1\}$ is $2^{\binom{n}{\le d}}$. It is an
interesting question to characterize the set of partial functions $f : B(0^n, d) \to \{0,1\}$ whose extension by
the Parity rule is a Boolean function.

On the other hand, every partial function $f : B(0^n, d) \to \{0,1\}$ can be uniquely extended to a total
function $f : \{0,1\}^n \to \{0,1\}$ such that $\deg_2(f) \le d$. This follows from the Möbius inversion formula for
multilinear polynomials over $\mathbb{F}_2$:

$$f(x) = \sum_{S \subseteq [n]} c_S \prod_{i \in S} x_i \quad \text{where} \quad c_S = \sum_{T \subseteq S} f(\mathbb{1}(T)), \qquad (4)$$

where the sums are modulo 2. If $\deg_2(f) \le d$, then $c_S = 0$ for all $S$ where $|S| \ge d+1$. Hence by Equation
(4), for $|S| \ge d+1$, we have the simple rule

$$f(\mathbb{1}(S)) = \sum_{T \subsetneq S} f(\mathbb{1}(T)). \qquad (5)$$

We can view this as a propagation rule for functions with $\deg_2(f) \le d$, which extends a labeling of the ball
$B(0^n, d)$ to the entire cube $\{0,1\}^n$. If we start from a labeling of the ball which corresponds to a function
$f : \{0,1\}^n \to \{0,1\}$ with $\deg(f) \le d$, then Equation (5) above coincides with the Parity rule.
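The $\mathbb{F}_2$ propagation rule, $f(\mathbb{1}(S)) = \sum_{T \subsetneq S} f(\mathbb{1}(T))$ with the sum taken modulo 2, is a plain XOR over proper subsets. The following sketch (with an illustrative function name and the same `frozenset` encoding of points) shows that, unlike the real-valued Parity rule, its output is always Boolean.

```python
from itertools import combinations

def apply_f2_rule(ball_values, n, d):
    """Extend 0/1 values on B(0^n, d) by XOR-ing over all proper subsets."""
    values = dict(ball_values)
    for size in range(d + 1, n + 1):
        for S in combinations(range(n), size):
            acc = 0
            for k in range(size):                 # proper subsets only
                for T in combinations(S, k):
                    acc ^= values[frozenset(T)]
            values[frozenset(S)] = acc
    return values
```

For instance, the radius-1 labeling with $f(0^n) = 0$ and $f(e_i) = 1$ for every $i$ extends to the parity function, which has $\deg_2 = 1$.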

5.3 When do the rules work? 

Given a partial function $g : B(x_0, r) \to \{0,1\}$, we can extend it to a total function $g^{\mathrm{Maj}} : \{0,1\}^n \to \{0,1\}$
by applying the Majority rule (if there is not a clear majority among the neighbors queried, the value is
undetermined). We can also extend it to a total function $g^{\mathrm{Par}} : \{0,1\}^n \to \mathbb{R}$ using the Parity rule. Given
a function $f : \{0,1\}^n \to \{0,1\}$, and a center $x_0$, we define a series of partial functions $f|_{B(x_0,r)}$ obtained
by restricting $f$ to the ball of radius $r$ around $x_0$. We are interested in how large $r$ needs to be for the Parity
and Majority rules to return the function $f$ for every choice of center $x_0$. Formally, we define the following
quantities.

Definition 31. Let $r^{\mathrm{Par}}(f)$ be the smallest $r$ such that for every $x_0 \in \{0,1\}^n$, the Parity rule applied to
$B(x_0, r)$ returns the function $f$. Formally,

$$r^{\mathrm{Par}}(f) = \min\{r : \forall x_0 \in \{0,1\}^n,\ (f|_{B(x_0,r)})^{\mathrm{Par}} = f\}.$$

Similarly, let $r^{\mathrm{Maj}}(f)$ be the smallest $r$ such that for every $x_0 \in \{0,1\}^n$, the Majority rule applied to
$B(x_0, r)$ returns the function $f$. Formally,

$$r^{\mathrm{Maj}}(f) = \min\{r : \forall x_0 \in \{0,1\}^n,\ (f|_{B(x_0,r)})^{\mathrm{Maj}} = f\}.$$

It is easy to see that $r^{\mathrm{Par}}$ captures the real degree of $f$:




Lemma 32. For all $f : \{0,1\}^n \to \{0,1\}$, we have $r^{\mathrm{Par}}(f) = \deg(f)$.

Proof: The inequality $r^{\mathrm{Par}}(f) \le \deg(f)$ follows from the fact that the Parity rule correctly extends degree
$d$ functions starting from any Hamming ball of radius $d$.

On the other hand, for any center $x_0$, running the Parity rule on $f|_{B(x_0,r)}$ for some $r < \deg(f)$ results in a
function $(f|_{B(x_0,r)})^{\mathrm{Par}}$ of degree at most $r$, since the Parity rule explicitly sets the coefficients of monomials
of degree higher than $r$ to 0. But then it follows that $(f|_{B(x_0,r)})^{\mathrm{Par}} \ne f$, since their difference is a non-zero
multilinear polynomial. ■

The proof above shows that quantifying over $x_0$ is not necessary in the definition of $r^{\mathrm{Par}}(f)$, since for
every $x_0 \in \{0,1\}^n$, we have

$$r^{\mathrm{Par}}(f) = \min\{r : (f|_{B(x_0,r)})^{\mathrm{Par}} = f\}.$$

We now turn to the Majority rule. 

Lemma 33. For all $f : \{0,1\}^n \to \{0,1\}$, we have $r^{\mathrm{Maj}}(f) = \min(2s(f), n)$.

Proof: We have $r^{\mathrm{Maj}}(f) \le n$, since $B(x_0, n)$ is the entire Hamming cube. The upper bound $r^{\mathrm{Maj}}(f) \le
2s(f)$ follows from the definition of the Majority rule and Theorem 10.

For the second part, we show that for every $r < \min(2s(f), n)$, there exists a center $x_0$ such that
$(f|_{B(x_0,r)})^{\mathrm{Maj}} \ne f$. Let $x$ be a point with sensitivity $s(f)$, and let $S \subseteq [n]$ be the set of $s(f)$ sensitive
coordinates at $x$. We will pick $x_0$ so that $d(x, x_0) = r+1$ as follows. If $r+1 \le s(f)$, we obtain $x_0$ from $x$
by flipping some $r+1$ coordinates from $S$. If $r+1 > s(f)$, then we obtain $x_0$ from $x$ by flipping all the
coordinates in $S$, and any $r+1-s(f)$ other coordinates $T \subseteq [n] \setminus S$. The condition $r+1 \le n$ guarantees
that a subset of the desired size exists, while $r+1 \le 2s(f)$ ensures that $|T| \le |S|$.

Since $d(x, x_0) = r+1$, the value at $x$ is inferred using the Majority rule applied to the neighbors of $x$ in
$B(x_0, r)$. These neighbors are obtained by either flipping coordinates in $S$ or $T$ (where $T$ might be empty).
The former disagree with $f(x)$ while the latter agree. Since $|S| \ge |T|$, the Majority rule either labels $x$
wrongly, or leaves it undetermined (in the case when $r+1 = 2s(f)$). This shows that $(f|_{B(x_0,r)})^{\mathrm{Maj}}(x) \ne f(x)$,
hence $r^{\mathrm{Maj}}(f) \ge \min(2s(f), n)$. ■

In contrast with Lemma 32, quantifying over all centers $x_0$ in the definition of $r^{\mathrm{Maj}}$ is crucial for the
lower bound in Lemma 33. This is seen by considering the $n$-variable OR function, where the sensitivity is
$n$. Applying the Majority rule to a ball of radius 2 around $0^n$ returns the right function, but if we center the
ball at $1^n$, then the Majority rule cannot correctly infer the value at $0^n$, so it needs to be part of the advice,
hence $r^{\mathrm{Maj}}(\mathrm{OR}) = n$.

5.4 Agreement of the Majority and Parity Rule 

Lemmas 32 and 33 can be viewed as alternate characterizations of the degree and sensitivity of a Boolean 
function. The degree versus sensitivity conjecture asserts that both these rules work well (meaning that they 
only require the values on a small ball as advice) for the same class of functions. Given that the rules are so 
simple, and seem so different from each other, we find this assertion surprising. 

In particular, Conjecture 2 is equivalent to the following statement:

Conjecture 34. There exist constants $d_1, d_2$ such that

$$r^{\mathrm{Par}}(f) \le d_1 (r^{\mathrm{Maj}}(f))^{d_2}. \qquad (6)$$

Along similar lines, one can use Theorem 7, due to [NS94], to show that the Majority rule recovers
low-degree Boolean functions:

$$r^{\mathrm{Maj}}(f) \le 8 (r^{\mathrm{Par}}(f))^2. \qquad (7)$$

Their proof uses Markov's inequality from analysis. It might be interesting to find a different proof, which
one could hope to extend to proving Equation (6) as well.


6 Conclusion (and more open problems) 

We have presented the first upper bounds on the computational complexity of low sensitivity functions.
We believe this might be a promising alternative approach to Conjecture 1 as opposed to getting improved
bounds on specific low level measures like block sensitivity or decision tree depth [KK04, ABG+14, AS11].

Conjecture 1 implies much stronger upper bounds than are given by our results. We list some of the ones
which might be more approachable given our results:

1. Every sensitivity $s$ function has a $\mathrm{TC}^0$ circuit of size $n^{O(s)}$.

2. Every sensitivity $s$ function has a polynomial threshold function (PTF) of degree $\mathrm{poly}(s)$.

3. Every sensitivity $s$ function $f$ has $\deg_2(f) \le s^c$ for some constant $c$.


A Omitted Proofs and Results

Proof of Lemma 18: It suffices to prove the bound for $s_1$. The proof is by induction on the dimension $n$.
Observe that if $\mu_1(f) = 1$ then the claim is trivial, so we may assume $\mu_1(f) \in (0,1)$. In the base case $n = 1$,
we must have $\mu_1(f) = 1/2$, in which case $s(f) = 1$ so the claim holds.

For any $i \in [n]$, let $f_{i,1} = f|_{x_i = 1}$ and $f_{i,0} = f|_{x_i = 0}$ denote the restrictions of $f$ to the subcubes defined
by $x_i$. These are each functions on variables in $[n] \setminus \{i\}$. Then

$$\mu_1(f) = \frac{1}{2}\left(\mu_1(f_{i,1}) + \mu_1(f_{i,0})\right).$$

If there exists $b \in \{0,1\}$ such that $0 < \mu_1(f_{i,b}) < \mu_1(f)$ then we can apply the inductive claim to the
restricted function $f_{i,b}$, to conclude that there exists a point $x \in f_{i,b}^{-1}(1)$ so that

$$s_1(f, x) \ge s_1(f_{i,b}, x) \ge \log\left(\frac{1}{\mu_1(f_{i,b})}\right) > \log\left(\frac{1}{\mu_1(f)}\right).$$

If not, it must be that $f(x) = 1$ implies $x_i = b$ for some $b \in \{0,1\}$, so that

$$\mu_1(f_{i,b}) = 2\mu_1(f) \quad \text{and} \quad \mu_1(f_{i,1-b}) = 0.$$

But then every point $x \in f^{-1}(1)$ is sensitive to $x_i$. Further, we can apply the inductive hypothesis to $f_{i,b}$ to
conclude that there exists $x \in f_{i,b}^{-1}(1)$ such that $x$ is sensitive to at least

$$\log\left(\frac{1}{\mu_1(f_{i,b})}\right) = \log\left(\frac{1}{2\mu_1(f)}\right) = \log\left(\frac{1}{\mu_1(f)}\right) - 1$$

coordinates from $[n] \setminus \{i\}$. Since $x$ is also sensitive to $i$, we have

$$s_1(f, x) \ge \log\left(\frac{1}{\mu_1(f)}\right).$$

For the final claim, assume the above bound holds with equality. Then there do not exist $i \in [n]$, $b \in
\{0,1\}$ such that $0 < \mu_1(f_{i,b}) < \mu_1(f)$ (if they did exist then we would get a stronger bound). So for every $i$,
either $\mu_1(f_{i,b}) = 0$ for some $b$, or $\mu_1(f_{i,0}) = \mu_1(f_{i,1})$. In the former case, the set $f^{-1}(1)$ is contained in the
subcube $x_i = 1 - b$. In the latter case, by induction we may assume that $f^{-1}(1)$ restricted to both $x_i = 0$ and
$x_i = 1$ is a subcube of density exactly $\mu_1(f)$ in $\{0,1\}^{[n] \setminus \{i\}}$, so every point in these subcubes must have
sensitivity $\log(1/\mu_1(f))$. We further claim that the two subcubes are identical as functions on $\{0,1\}^{[n] \setminus \{i\}}$.
If they were not identical, then some point (in each subcube) would be sensitive to coordinate $i$, but then this
point would have sensitivity at least $\log(1/\mu_1(f)) + 1$.

This implies that $f^{-1}(1)$ is a subcube defined by the equations $x_i = 1 - b$ for all pairs $(i, b)$ such that
$\mu_1(f_{i,b}) = 0$. □


A.1 A Top-Down Algorithm

Next we describe a “top-down” algorithm for computing $f(x)$ where $f$ is a function of sensitivity $s$. This
algorithm has a similar performance bound to our “bottom-up” algorithm described earlier.

Associate the bit string $x \in \{0,1\}^n$ with the integer $z(x) = \sum_{i \in [n]} x_i 2^i$ and let $x < x'$ if $z(x) < z(x')$.
We refer to this as the colex ordering on strings.

The top-down algorithm also takes the values of $f$ on $B(0^n, 2s)$ as advice. Given an input $x \in \{0,1\}^n$
where we wish to evaluate $f$, we recursively evaluate $f$ at the first $2s+1$ neighbors of $x$ of Hamming weight
$\mathrm{wt}(x) - 1$ in the colex order. The recursion bottoms out when we reach an input of weight $2s$. The restriction
to small elements in the colex order ensures that the entire set of inputs on which we need to evaluate $f$ is
small. A detailed description of the algorithm follows:

Top-Down

Advice: $f$ at all points in $B(0^n, 2s)$.

Input: $x \in \{0,1\}^n$.

1. If $\mathrm{wt}(x) \le 2s$ or if $f(x)$ has been computed before, return $f(x)$.

2. Otherwise, let $x_1, \ldots, x_{2s+1}$ be the $2s+1$ smallest elements in
$N(x)$ of weight $\mathrm{wt}(x) - 1$ in the colex order. If some $f(x_i)$ has
not been computed yet, compute it recursively and store the
value.

3. Return $f(x) = \mathrm{Maj}_{i \in [2s+1]}(f(x_i))$.
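A memoized Python sketch of this procedure is given below. It encodes $x$ as an integer whose bit $i$ is $x_i$, so that comparing integers realizes the colex order under the assumption $z(x) = \sum_i x_i 2^i$; the `cache` dictionary plays the role of the stored values. The function and parameter names are illustrative.

```python
def top_down(x, advice, s, cache=None):
    """Evaluate f(x) for x encoded as an int bitmask (bit i is x_i).

    advice: callable giving f on inputs of Hamming weight <= 2*s.
    cache: dict of already computed values, shared across recursive calls.
    """
    if cache is None:
        cache = {}
    if x in cache:
        return cache[x]
    if bin(x).count("1") <= 2 * s:
        cache[x] = advice(x)
        return cache[x]
    ones = [i for i in range(x.bit_length()) if (x >> i) & 1]
    # Clearing a higher-index 1 gives a smaller integer, so the 2s+1 smallest
    # neighbors of weight wt(x)-1 clear the 2s+1 largest 1-indices.
    neighbors = [x & ~(1 << i) for i in ones[-(2 * s + 1):]]
    votes = sum(top_down(y, advice, s, cache) for y in neighbors)
    cache[x] = 1 if votes > s else 0      # majority of 2s+1 votes
    return cache[x]
```

Because at most $s$ neighbors of $x$ can disagree with $f(x)$, any $2s+1$ correctly evaluated neighbors have the right majority, which is why restricting to the colex-smallest ones is safe.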


The key to the analysis is the following lemma.

Lemma 35. Let $\mathrm{wt}(x) = d$. For $2s \le k \le d$, the number of weight $k$ vectors $x'$ for which $f(x')$ is computed
by the Top-Down algorithm is bounded by

$$\binom{d-k+2s}{d-k} \le d^{2s}.$$

Proof: Given $x \in \{0,1\}^n$, for $t \le \mathrm{wt}(x)$, let $R(t) \subseteq [n]$ denote the $t$ largest indices $i \in [n]$ where $x_i = 1$.
We claim that all vectors $x'$ with $\mathrm{wt}(x') = k$ that are generated by the algorithm are obtained by setting
$d-k$ indices in $R(d-k+2s)$ to 0. This claim clearly implies the desired bound.

The claim is proved by induction on $d-k$. The case $d-k = 1$ is easy to see, since the $2s+1$ smallest
neighbors of $x$ in the colex order (of weight one less than $x$) are each obtained by setting one of the indices
in $R(2s+1)$ to 0. For the inductive case, assume that $\mathrm{wt}(y) = k$, and that $y$ is generated as a neighbor of $y'$
with $\mathrm{wt}(y') = k+1$. Inductively, $y'$ is obtained from $x$ by setting indices in $S \subseteq R(d-k-1+2s)$ to 0,
where $|S| = d-k-1$, and hence leaving $2s$ of them 1. Thus the $2s$ smallest neighbors of $y'$ are obtained
by setting indices in $R(d-k-1+2s) \setminus S$ to 0, and the $(2s+1)$th neighbor is obtained by setting the
$(d-k+2s)$th 1 from the right to 0. In both cases, we get $d-k$ indices from $R(d-k+2s)$ being set to 0.
This completes the induction. ■


Theorem 36. The Top-Down algorithm computes $f(x)$ for any input $x$ in time $O(s n^{2s+1})$ using space
$O(n^{2s+1})$.

Proof: By the preceding lemma, for an input $x$ of weight $d$, the total number of $x'$ for which $f(x')$ is
computed and stored is bounded by

$$\sum_{k=2s}^{d} d^{2s} \le d^{2s+1} \le n^{2s+1}.$$

The cost of computing $f$ at $x$ given $f$'s values at the relevant $2s+1$ neighbors of $x$ (see Step 3) is $O(s)$, so
the amortized cost for computing $f(x)$ at each $x$ is bounded by $O(s)$. Hence overall the running
time and space are bounded by $O(s n^{2s+1})$ and $O(n^{2s+1})$ respectively. ■